BO 156 69t 

DOCUMENT RESUME

/IB 007 lOi

AUTHOR        Baldauf, Richard B., Jr.
TITLE         Evaluation Models and Instrumentation: Problems for
              Title I in America's Pacific Possessions.
PUB DATE      Feb 78
NOTE          25p.; Paper presented at the Trust Territory Title I
              Conference (February 9-14, 1978)

EDRS PRICE    MF-$0.83 HC-$1.67 Plus Postage.
DESCRIPTORS   Cloze Procedure; Cost Effectiveness; Decision Making;
              Elementary Secondary Education; *English (Second
              Language); *Evaluation Methods; *Models; *Norms;
              *Program Evaluation; Standardized Tests; Systems
              Approach; Test Construction; Test Selection
IDENTIFIERS   *Elementary Secondary Education Act Title I; Local
              Norms Model; *Pacific Trust Territory



ABSTRACT

An overview of the systems analysis approach to evaluation and
the psychometric models currently proposed to evaluate
Elementary and Secondary Education Act Title I projects is
presented. It is argued that, in America's Pacific possessions,
the sole use of the systems analysis approach to evaluation
fails to provide adequate information to questions of central
importance for English as a second language program
development. In this particular setting, the proposed models,
although well researched and considerably refined in recent
years, are often inappropriate and difficult to implement
correctly, and due to the cost factor they may actually detract
from program quality. Tests designed for use in the continental
United States are generally inappropriate for islands where
English is a second language. Three models developed by the RMC
Research Corporation for Title I evaluation are discussed:
(1) the norm-referenced model; (2) the control group model; and
(3) the special regression model. Two preferable evaluation
models would be the use of standardized tests with local norms,
and the development of cloze tests with local norms.
(Author/CTM)




Reproductions supplied by EDRS are the best that can be made
from the original document.



EVALUATION MODELS AND INSTRUMENTATION:
PROBLEMS FOR TITLE I IN AMERICA'S PACIFIC POSSESSIONS

Richard B. Baldauf, Jr.
James Cook University of North Queensland
Townsville, Australia


ABSTRACT

An overview of the systems analysis approach to
evaluation and the psychometric models currently proposed
to evaluate ESEA Title I projects is presented. It is
argued that, in America's Pacific possessions, the sole
use of the systems analysis approach to evaluation fails
to provide adequate information to questions of central
importance for English-second-language program development.
In this particular setting, the proposed models, although
well researched and considerably refined in recent years,
are often inappropriate, difficult to implement correctly,
and due to the cost factor may actually detract from
program quality. A further complicating factor is the
lack of adequate instruments (tests) for the models to be
validly implemented. Reasons for these problems are
detailed and an alternative model and some possible
solutions to the instrumentation problem are suggested.
The paper recommends that allowance be made for a wider
variety of evaluation approaches in the final Title I
guidelines.



Paper presented to the Trust Territory Title I
Conference, February 9 - 14, 1978. This research
was supported by a University Research Grant from
James Cook University, and by funds provided by the
U.S.O.E. to the Mariana Islands under ESEA Title III.
The opinions stated in the paper are those of the
author and no endorsement of the work by the U.S.O.E.
is implied.






The systems analysis approach (McLaughlin, 1975) is
one of a number of strategies which have been proposed
for and used in educational program evaluation (e.g.
Worthen and Sanders, 1973). It is particularly important,
however, in the American context because "it has served
as the major evaluation perspective in the United States
Department of Health, Education and Welfare since about
1965 (House, 1977, p.1)." In recent years, a particular
subset of evaluation strategies has been developing within
the general systems analysis framework, and these models
may soon become the required evaluation and reporting
methods for all Elementary and Secondary Education Act
(ESEA) Title I projects. These projects, which focus on
compensatory instruction in the basic skill areas of
reading and mathematics, involve large scale funding of
programs in most of the 50 states and in America's
Pacific possessions. Thus, the potential widespread use
of the proposed models and their implications for program
evaluation make them ones with which program managers
and evaluators should be familiar.

The Systems Analysis Approach

A systems analysis approach to evaluation is based
on relating quantitative output measures, usually test
scores, to program differences (House, 1977). This has
most often been accomplished through the use of
correlational techniques, but increasingly has involved the
use of experimental design (e.g. Marco, Murphy & Quirk, 1976).
The key concepts in the systems analysis approach as put
forward by its leading proponent Alice M. Rivlin (1971)
have been summarized by House (1977, p. 8) as follows:

- Key decisions will be made by higher
  governmental levels.

- The end of evaluation is efficiency in the
  production of social services.

- The only true knowledge is a production
  function specifying stable relationships
  between educational inputs and outputs.

- The only way to such knowledge is through
  experimental methods and statistical
  techniques.

- It is possible to agree on goals and on a
  few output measures.

- There is a direct parallel between production
  in social services and in manufacturing.
  The same techniques of analysis will apply.

The aim of the evaluation process in the systems
analysis approach is to provide generalisations that will
hold in various circumstances. Large samples are required
both to detail the range of circumstances, and to rid the
final production function of idiosyncrasies. The final
product will enable the major consumers of evaluation,
the managers and administrators of governmental programs,
to produce social services more effectively (House, 1977).

Background to the Title I Evaluation System 

Title I of ESEA currently provides over $2 billion
annually to educational agencies, serving approximately
9 million students in areas with concentrations of children
from low income families, to provide remedial services in
basic skills. Although the reporting procedures for such
grants provided for appropriate, objective measurement to
be submitted on at least an annual basis, many of the
reports submitted at the federal level have shown a "lack
of comparability -- and often validity -- of the data
in them (Anderson, 1977, p. 2)." The need for better
evaluation procedures was further documented by a
literature search which

    encompassed some 2,000 projects, all of which
    had received some form of "official"
    recognition for success. Of the 2,000, only six
    could be found which, under close scrutiny,
    were able to meet the selection criteria
    of effectiveness, cost, availability and
    replicability established for this search.
    Most discouraging, however, was the fact
    that not one of the evaluations provided
    acceptable evidence regarding project success
    or failure. In all cases, problems in
    conducting and reporting the evaluations rendered
    the results inconclusive (Horst, Tallmadge &
    Wood, 1974, 1-2).

In an effort to insure a greater degree of
accountability, the United States Congress passed section 151
of the Title I Act in August, 1974. Under this section
the United States Office of Education (USOE) was required
to implement a



    complete evaluation program: conduct
    evaluations, upgrade evaluation activities
    at other administrative levels so that
    reported data are comparable, use those
    data to -- among other things -- identify
    especially effective instructional practices,
    and disseminate information about those
    practices (Anderson, 1977, p. 3).

Under this legislative mandate a series of contracts
were awarded to the RMC Corporation. The first provided
for a review of Title I program evaluation, recommended
common reporting practices and checked on the feasibility
of the resulting suggestions with a small sample of
administrators and evaluators (Gamel, Tallmadge, Wood,
and Binkley, 1975). The second involved all states and
territories and some local school districts in a discussion
of a prototype system and its implications for those
settings (Bessey, Rosen, Chiang, and Tallmadge, 197?).

From these discussions have emerged three structured
evaluation models, with the provision for the use of
other models if they can provide comparable data. The
system is expected to allow for "the aggregation of
unbiased, project-valid estimates of the effects of
Title I services ... expressed in a common metric to
make such aggregation possible (Anderson, 1977, p.11)."
Besides providing for aggregation, the system is expected
to be useful to identify especially effective educational
programs, to facilitate the monitoring and guidance of
programs, and to help upgrade evaluation activities.
"Hence, our hopes for the system are that decision makers
at all levels will find benefits from its use (Anderson,
1977, p. 13)." Local education authorities (LEA's) are
expected to begin implementation of these models in the
1979-1980 school year.

The Evaluation Models

The evaluation models first appeared in October 1974
as an RMC report entitled Measuring Achievement Gains in
Educational Projects (Horst, et al, 1974). The Government
Printing Office version is reported to have sold over 8000
copies during its first year in print (Anderson, 1977).
As indicated in the preceding section, three models have
been selected from this initial report and further refined
for use with Title I projects.

    These designs are: Model A, the Norm-
    Referenced Model; Model B, the Control
    Group Model; and Model C, the Special
    Regression Model. Further flexibility
    is afforded in that each design has
    variations to accommodate the use of
    either normed achievement tests (Models
    A1, B1 and C1) or tests for which normative
    data are not available (Models A2,
    B2, and C2) (Tallmadge & Wood, 1976).

Model A, the Norm-Referenced Model, assesses gain
by comparing the pre- and posttest percentile status of
students either directly with national norms (Model A1)
or indirectly by using the equipercentile method to
equate non-normed test scores with a normed test given
at the same time as the pretest (Model A2). In addition,
the model requires that (a) pre-posttesting must occur
on the empirical normative dates, (b) the same level and
form of the test must be used for pre- and posttests, and
(c) pretesting must occur after project sample selection
to avoid regression effects (Tallmadge et al, 1976).
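
The equipercentile idea behind Model A2 can be illustrated with a short sketch. It is not the RMC procedure itself: it assumes midpoint percentile ranks, no smoothing, and that both tests were given to the same group at pretest time. The function names and any scores passed to them are hypothetical.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(scores, x):
    """Midpoint percentile rank (0-100) of score x within a group of scores."""
    ordered = sorted(scores)
    below = bisect_left(ordered, x)          # scores strictly below x
    at = bisect_right(ordered, x) - below    # scores equal to x
    return 100.0 * (below + 0.5 * at) / len(ordered)


def equipercentile_equate(local_scores, normed_scores, x):
    """Map local-test score x to the normed-test score with the closest
    percentile rank in the same group (ties resolve to the lower score)."""
    target = percentile_rank(local_scores, x)
    candidates = sorted(set(normed_scores))
    return min(
        candidates,
        key=lambda s: abs(percentile_rank(normed_scores, s) - target),
    )
```

Once a non-normed score has been mapped onto the normed test's scale, the published norm table for that test can be used to read off a percentile, which is the step the model description above relies on.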

Model B, the Control Group Model, requires that pre-
posttest data be collected at the same time for both the
Title I treatment and for the random or "random
in-effect" control groups. Post hoc matching procedures
are not allowed, but either an analysis of covariance
(assuming random assignment) or a standardized-gain-score
method of adjustment (assuming different populations) may
be used to adjust for pre-test differences. In addition,
all supplementary instructional services must be withheld
from the comparison group. Although testing dates are
flexible and different tests or levels of the same test
may be used, the treatment group must take a nationally
normed test sometime during the year (Tallmadge et al,
1976).
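
The covariance adjustment mentioned for Model B can be sketched as follows. This is a generic textbook covariance adjustment, not code from the RMC reports: the treatment effect is estimated as the posttest mean difference minus the pooled within-group slope times the pretest mean difference. All names are hypothetical.

```python
from statistics import mean


def _group_sums(pre, post):
    """Means plus sums of squares and cross-products for one group."""
    mx, my = mean(pre), mean(post)
    sxx = sum((x - mx) ** 2 for x in pre)
    sxy = sum((x - mx) * (y - my) for x, y in zip(pre, post))
    return mx, my, sxx, sxy


def ancova_adjusted_difference(pre_t, post_t, pre_c, post_c):
    """Posttest difference (treatment minus control) adjusted for
    pretest differences using the pooled within-group slope."""
    mx_t, my_t, sxx_t, sxy_t = _group_sums(pre_t, post_t)
    mx_c, my_c, sxx_c, sxy_c = _group_sums(pre_c, post_c)
    slope = (sxy_t + sxy_c) / (sxx_t + sxx_c)
    return (my_t - my_c) - slope * (mx_t - mx_c)
```

The sketch returns only a point estimate. A full analysis would also test the equal-slopes assumption and report a standard error, which is part of why the model demands trained evaluation staff.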

Model C, the Special Regression Model, incorporates
two different evaluation designs, the
regression-discontinuity design (Campbell and Stanley, 1963) and the
regression-projection design (Horst et al, 1974).
Pretesting must include the entire group from which the
treatment and comparison groups will be formed. Students
are assigned to the treatment group only on the basis of
the pretest cut off score while comparison group students
may not receive any special instructional services. Care
must be taken not to remove slow or disruptive students
as this may invalidate the evaluation process. Ideally
about 50 treatment and 100 comparison pupils should be
used to implement the model. The treatment group must
take a nationally normed test sometime during the year.
However, different pre-posttests may be used if they
correlate highly (.40 or better) (Tallmadge et al, 1976).
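
A rough sketch of the regression-projection logic may help make Model C concrete. It rests on assumptions not spelled out above: pupils scoring below the cutoff form the treatment group, a straight line is fitted to the comparison group only, and that line is projected to the treatment group's mean pretest score to give a no-treatment expectation. The names are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares slope and intercept for y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def regression_projection_effect(pre, post, cutoff):
    """Observed treatment posttest mean minus the posttest mean projected
    from the comparison-group regression line (assumed design)."""
    treat = [(x, y) for x, y in zip(pre, post) if x < cutoff]
    comp = [(x, y) for x, y in zip(pre, post) if x >= cutoff]
    slope, intercept = fit_line([x for x, _ in comp], [y for _, y in comp])
    treat_pre_mean = mean([x for x, _ in treat])
    treat_post_mean = mean([y for _, y in treat])
    expected = intercept + slope * treat_pre_mean
    return treat_post_mean - expected
```

With only a handful of comparison pupils the fitted line is unstable, which is one way to see why the model asks for roughly 50 treatment and 100 comparison pupils.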

The models discussed in the preceding paragraphs
are recommended for implementation "in a hierarchy based
on technical desirability" (Model B, then Model C, then
Model A), however "choosing a model will always involve
making trade offs between technical and practical
considerations (Tallmadge et al, 1976, p.19)." Figure 1 summarizes
the models and indicates the key decision points for
model selection.

Insert Figure 1 About Here



Finally, regardless of the model selected, the
results are converted to Normal Curve Equivalents (NCE's)
to provide for comparisons across Title I projects. This
last point is important because it emphasizes the
underlying systems analysis basis of the evaluation system
which is outcome rather than process oriented, particularly
at the project level. The system's originators however
take the point of view that

    the data called for by the proposed system
    will do more than provide evidence regarding
    overall effectiveness of the Title I program.
    The system will permit analysis of project-
    level relationships among cost, achievement
    gains, hours of intervention, grade levels,
    instructor pupil ratios, and initial degree
    of educational need. It will, then, enable
    investigation of most of the major and minor
    concerns expressed by educational policy
    makers interviewed during Phase I of the study
    (Tallmadge et al, 1976, p.12 - draft version).
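
For readers unfamiliar with the NCE metric, the conversion from a percentile rank can be sketched as below. The paper does not give the formula; the sketch uses the standard definition, in which NCEs are normalized scores with mean 50 and standard deviation 21.06, so that NCEs of 1, 50 and 99 coincide with percentiles 1, 50 and 99. The function name is hypothetical.

```python
from statistics import NormalDist


def percentile_to_nce(percentile):
    """Convert a percentile rank (strictly between 0 and 100) to an NCE."""
    if not 0 < percentile < 100:
        raise ValueError("percentile must lie strictly between 0 and 100")
    z = NormalDist().inv_cdf(percentile / 100.0)  # standard normal deviate
    return 50.0 + 21.06 * z
```

Because NCEs form an equal-interval scale, project gains can be averaged across sites, which is what the aggregation system requires. The conversion is only as meaningful as the norms that produced the percentile in the first place.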

Applying the Models in America's Pacific Possessions

The work discussed thus far represents an impressive
job of synthesizing empirical evaluation designs and
detailing models for a wide variety of evaluation needs.
Undoubtedly, evaluation has developed to a point where
attempts at synthesis should be encouraged (Gephart,
1977). Certainly, the need for a wider variety of valid
and comparable evaluation studies is evident. However,
to make the evaluation models discussed in the preceding
paragraphs mandatory for all agencies who wish to receive
Title I funding, even if none of the models is appropriate,
is not only bureaucratic mismanagement, but introduces
error, in the form of invalid summary results, into the
aggregation system which is central to the proposed
system. This in turn undermines the usefulness and
validity of generalizations obtained. The following
sections attempt to validate the hypothesis that none
of the models as they stand is normally appropriate for
evaluating Title I English-second-language programs in
America's Pacific territories.

Figure 1. Decision tree for selecting an evaluation model.
(Adapted from Tallmadge et al, 1976, p. 20)

    Question 1: Can an appropriate control group be found?
    Question 2: Are Model B requirements feasible, acceptable?
    Question 3: Are enough pupils available for implementing
                Model C?
    Question 4: Are Model C requirements feasible, acceptable?
    Question 5: Are Model A requirements feasible, acceptable?

    Each model then branches on whether the test used has
    national norms or is non-normed.
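
The five questions of Figure 1 can be read as a simple selection routine. The sketch below is an interpretation, not a transcription of the figure: the branch order is inferred from the stated hierarchy (Model B, then Model C, then Model A), and the function and argument names are hypothetical.

```python
def select_model(control_group, b_feasible, enough_pupils,
                 c_feasible, a_feasible, nationally_normed):
    """Return a model label such as 'B1' or 'A2', or None if no model fits.

    Suffix 1 means a nationally normed test is used, 2 a non-normed test.
    """
    suffix = "1" if nationally_normed else "2"
    if control_group and b_feasible:      # Questions 1 and 2
        return "B" + suffix
    if enough_pupils and c_feasible:      # Questions 3 and 4
        return "C" + suffix
    if a_feasible:                        # Question 5
        return "A" + suffix
    return None                           # no prescribed model applies
```

A result of `None` corresponds to the situation argued for in the following sections, where each of the five questions is answered in the negative for most Pacific island schools.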



Selection of A Model 






In the Pacific island setting, where schools tend
to be small, where communities even within a culture are
relatively unique, where communication is difficult, and
where the majority of students only begin to study and
speak English when they come to school, the evaluator
has few options when it comes to model selection. This
statement becomes clearer when the five questions (see
Figure 1) posed to aid in the selection of an evaluation
model are reviewed.

1. Can a suitable control group be found? Because
of community differences, cultural differences, different
degrees of exposure to the English language and to western
culture, and the fact that there are only a small number
of schools, usually with one class to a grade, it is
difficult to find control groups except in a few of the
larger, mainly secondary schools.

2. Can the requirements imposed by Model B be met
and are they acceptable? In the few cases where a
treatment and control group selected in the same manner
can be found, the use of Model B is often not appropriate
because of the requirement that students in the control
group may not receive any supplementary instructional
services. In most cases, students in America's Pacific
region schools are involved in one or more compensatory
programs. In American Samoa for example, all students
in grades 8 - 12, except a few exempted first language
students, were involved in Title I ESL programs. The
similarity of bilingual and some Title III programs in
method and approach to Title I further complicates the
use of Model B. These group differences and the problems
of program overlap make it difficult to meet the
requirements necessary for the control group evaluation strategy
to be implemented.

3. Are there enough participants and non-participants
at each grade level to enable implementation of Model C?
Adequate numbers are a problem to some degree with this
approach, particularly at the elementary school level.
Since students must be assigned based on pretest scores
to either the treatment or control group, students must
come from one school so that a rearrangement of classes
is possible. Small, one class per grade, schools are
thereby effectively excluded from Title I programs and
the problem of over testing and multiple program
interaction is increased in the larger schools.

4. Can the requirements imposed by Model C be met
and are they acceptable? Although Model C has been
used as the best available in the circumstances for
evaluation in the Trust Territory, it has some of the
same problems as Model B. In particular, it may be
difficult to meet the requirement that all compensatory
services be withheld from the control group.

5. Can the requirements imposed by Model A be met,
and are they acceptable? Model A requires the
administration of at least one appropriate norm-referenced
test. There currently are no norm-referenced tests
available for ESL students and so the Model can not be
validly implemented. The question of valid tests is one
which applies, of course, to all the models and because
of its importance will be examined in detail in a separate
section.

Implementation Problems

Although the use of one of the models is/will be
a Title I evaluation requirement, it is only in rare
circumstances that any of the models may be used validly
in America's Pacific possessions without violating
some major evaluation design assumption. Thus, without
even considering the problem of what to do to obtain
valid instrumentation, it is difficult to find an
appropriate Title I evaluation design.

The use of the models and their proper implementation
is further complicated by the fact that the Pacific
territories have unique problems when compared to other
Title I programs. These include vast distances, few
trained personnel, high staff turnover, more limited
access to external consultants, and higher charges for
evaluation materials and services. These problems are
compounded by the Title I programs' comparatively small
budgets, based on small numbers of pupils served.
Madden (1976) has pointed out some of the potential
implementation problems inherent in the models and has
emphasized the need for greater training and support
programs by the USOE.



Local Norms

It might be argued that many of the above problems
can be solved by using tests with local norms. Although
this is a step in the right direction in some respects,
it does not help the evaluator to find a valid model since
our previous analysis has shown that the models are
usually invalid regardless of the type of testing selected.
Furthermore, to use this approach it is necessary to
give a "nationally normed test" in order to implement
any of the three models (Tallmadge et al, 1976, p.14).
As we shall see, there are no valid nationally normed
tests for use with ESL students, and so on the basis of
current test availability, it is not possible to
correctly implement any of the models using local norms.

An Alternative Model

For program evaluation to be valid, project students
must be comparable to control students or to norms for
students with similar characteristics. For the ESL
students, this means they must be compared with students
with second language backgrounds. There are few such
students in American standardized test norming samples.
Further, there is some evidence to indicate that ESL
students in America's Pacific territories have weaker
English language skills than Native American ESL students
in the States (Baldauf, 1976), due perhaps to the fact
that there is less opportunity for and need to use
English in Pacific cultural settings. Thus, Pacific
island children's linguistic needs and problems are not
comparable to the majority of Title I students.



These factors suggest the need for a new model which
allows for these problems. I would propose the use of
the Local Norms Model, which follows Model A and is the
simplest of the evaluation models to implement. The
Local Norms Model would use as a standard of comparison
locally developed norms based on representative samples
of the relevant population. Such an approach would allow
evaluation model assumptions to be met and would remove
the requirement to compare second language students to
first language norms. Program evaluation would be greatly
[...] progress would be measured against
an appropriate standard.
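
To show what a Local Norms Model calculation might involve, here is a minimal sketch that builds percentile norms from a representative local sample and compares a pupil's pretest and posttest standing. It assumes separate norm tables for the pretest and posttest dates, in line with the Model A requirement that testing occur on normative dates. The function names and the data layout are hypothetical.

```python
from collections import Counter


def build_local_norms(sample_scores):
    """Map each raw score in a local norming sample to its midpoint
    percentile rank (0-100)."""
    counts = Counter(sample_scores)
    total = len(sample_scores)
    norms, below = {}, 0
    for score in sorted(counts):
        norms[score] = 100.0 * (below + 0.5 * counts[score]) / total
        below += counts[score]
    return norms


def percentile_change(pre_norms, post_norms, pretest, posttest):
    """Change in local percentile standing between pretest and posttest.
    Raw scores absent from a norm table raise KeyError in this sketch."""
    return post_norms[posttest] - pre_norms[pretest]
```

A pupil who merely keeps pace with the local norming population shows a change near zero, so a positive change can be read as growth beyond what is typical locally, without reference to stateside first language norms.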

Meeting Additional Evaluation Needs 

I believe a Local Norms Model would have additional
benefits as well. ESL programs, such as those in the
Trust Territory developed to meet the unique needs of
small cultural groups, require detailed feedback on "how"
and "why" a program has succeeded, or failed, if
adjustments to the program are to be made and if the standard
of education is to be improved. The systems analysis
model, which Anderson (1977) and Tallmadge, et al (1976)
conclude provides useful results for program managers,
does not, from my own experience, provide the data
necessary to improve ESL programs. Mayer (1975) goes
further than this to suggest that the standardized
testing necessary to the systems analysis model has
tended to divert attention from the theoretical problems
in reading and lessen the effectiveness of compensatory
education. The systems analysis approach has little
concern for the theoretical issues on which learning
depends. Rather, it concentrates on collecting
production-like results, useful for governmental administrators
who need to know "if" a program is working, "for whom",
and at what cost.

I believe the Local Norms Model would provide the
USOE with the comparable data necessary to measure the
effectiveness of Title I ESL programs. In addition, the
simplified data collection and analysis procedures and
ability to use relevant local tests would provide
additional time and resources necessary to undertake the
formative evaluation activities required for program
improvement. However, the implementation of this approach
to evaluation depends directly on development of adequate,
locally normed tests.

Test Selection/Construction 

The problem of instrumentation is a difficult one
for program evaluation in general. Tallmadge and Horst,
two of the researchers most responsible for the current
Title I evaluation guidelines, indicate that
instrumentation, the selection of valid measures of achievement,

    ...is currently the weakest link in the
    educational evaluation chain. Until it
    is strengthened, the information available
    for making policy decisions will continue
    to be of marginal utility (1977, p.11).

The problem is even greater in the Trust Territory
because of the lack of valid ESL tests. A number of
approaches to the selection and development of valid
measures of achievement are possible and each of these
alternatives will be examined in turn.

Standardized ESL Tests

Buros (1974) in Tests in Print II lists 11 English
foreign language tests. Most are clearly intended as
measures of university entrance English language
proficiency. Although tests like the Michigan Test of
English Language Proficiency (Division of Testing and
Certification, English Language Institute, 1962) might
be useful for evaluating upper level high school programs
if appropriate local norms were developed (Baldauf, 1978),
the tests are generally too specialized and restricted
to provide ready made solutions to ESL testing problems.
Except as noted above, they probably are most useful
as examples of what has and can be done in ESL testing.

Norm-Referenced Tests

A wide variety of norm-referenced tests are available
from test publishers. The Gates MacGinitie (Gates and
MacGinitie, 1972), the Stanford Achievement Test (Madden,
Gardner, Rudman, and Kelly, 1964; Baldauf and Reupena,
1973) and the SRA Achievement tests (SRA, 1973) among
others have been used extensively in the Pacific region
to evaluate programs. These tests, which were designed
for first language students studying a stateside curriculum,
are too difficult for ESL students when given at grade
level. The practice has been adopted of giving the tests
at a higher grade level than the one for which they were
normed. This practice has had several detrimental effects.
In particular, curriculum content is usually inappropriate
due to improper test level. There is also the problem of
improper comparison and interpretation of the test
results, in which ESL Pacific Island children are compared
with inappropriate (in age, grade and curriculum)
stateside test norms.

Stateside norm-referenced tests do have a place in
Pacific island programs. Properly used they can serve
as a counseling guide for students wishing to enter
stateside educational programs. They are not, however,
appropriate instruments to evaluate ESL programs. In
fact many norm-referenced tests may not be valid
evaluation measures for Title I program students. Doherty
(1977), in a restandardization study of the California
Achievement Test, concluded that the use of CTB scores
led to improbable conclusions about ESAA students which
were rectified when the test scores were rescaled. He
further suggests that it is likely that these problems
"will be encountered in the use of any norms that have
not been based specifically on disadvantaged, low
achieving students (p. 31)."

Locally Designed Psychometric Tests

One of the ways to tackle the problem of the lack
of appropriate tests would be to follow the example of
American Samoa and develop psychometric tests based on
local curriculum objectives. This is a complex process.
The details of design of the test development program
are available in Baldauf and Dunn-Rankin (1973) and the
results and sample tests are available in a series of
University of Hawaii Reports (e.g. Chin-Chance, Norton,
Rechebei and Bail, 1975). Another example of a test
specifically designed for ESL students at the secondary
level is the English Structures Test (Catling and Gobert,
1973). However, to be useful for program evaluation,
these tests would need to have local norms based on a
representative sample developed for them. A project is
currently underway in Saipan to evaluate and norm the
Marianas Test of English Achievement (MTEA) (Klingbergs
and Dom, 1977) to provide an appropriate ESL test for
intermediate level ESL students (Baldauf and Annesley,
1977). Work in this area could be simplified by starting
from the Samoan tests already available.

Experience has shown that the development of
psychometrically based tests is an involved process requiring
considerable outside expertise and local effort to
complete. Depending on how it is done, it may require
considerable expense and demands expert ESL and
evaluation staff familiar with local conditions to produce
adequate results.

Local Norms for Standardized Tests

Another approach to the problem would involve the
selection of the most appropriate norm-referenced tests
available for use as they are except that local norms
would be developed on a representative sample basis for
the population to be evaluated. A study of high school
students in American Samoa in which correlations between
English classroom grades over four years and MTELP scores
were examined suggests that this process may be a valid
one (Baldauf, 1978).
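
The kind of validity check described above, correlating classroom grades with test scores, can be sketched with a plain Pearson correlation. This illustrates the statistic only and does not reconstruct the cited study. The function name is hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

A substantial positive correlation between locally assigned grades and scores on the candidate test is one piece of evidence that the test measures something relevant to the local program. It does not by itself establish that the published norms are appropriate.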

It would be much cheaper and quicker than developing
the psychometrically based tests mentioned in the previous
section. Doherty's (1977) Restandardization Study
provides an example of how it could be done. The
irritating factor of having obviously inappropriate
items in the test, in terms of culture, could be controlled
by not including these items in the raw scores on which
norms were based.
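
Leaving culturally inappropriate items out of the raw score is a small computation, sketched below under the assumption that each pupil's responses are stored as a mapping from item identifier to 1 (correct) or 0 (incorrect). The item identifiers and the function name are hypothetical.

```python
def rescore(item_responses, excluded_items):
    """Raw score counting only items not flagged as inappropriate.

    item_responses: dict mapping item id -> 1 (correct) or 0 (incorrect).
    excluded_items: iterable of item ids to leave out of the score.
    """
    excluded = set(excluded_items)
    return sum(
        correct
        for item, correct in item_responses.items()
        if item not in excluded
    )
```

Local norms would then be computed from these rescored totals for the whole norming sample, so that the same reduced item set underlies both the norms and each pupil's score.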

CLOZE Tests

A final approach to developing valid achievement
measures for ESL students is the use of CLOZE tests.
CLOZE tests, which are based on the concept of the
student supplying words which have been deleted from
reading passages, have been successfully used to assess
reading ability of ESL students in Papua New Guinea and
in Singapore (Anderson, 1976), and with ESL entrants to
American universities (Hisama, Lewis and Woehlke, 1977).
Baldauf and Propst (1978) have extended the use of CLOZE
tests to lower elementary school ESL children. In
Australia, two tests based on the CLOZE procedure, the
GAP Reading Comprehension Test (McLeod, 1967) for use
with Grades 4-7, and the GAPADOL Reading Comprehension
Test (McLeod and Anderson, 1972) for Grades 8 and 9, have
gained acceptance.
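As a rough illustration of the procedure, the sketch below builds a CLOZE passage and scores it. It assumes fixed-ratio deletion (every n-th word) and exact-word scoring, which are common conventions but are not specified in this paper; the function names and the sample passage are hypothetical.

```python
import string


def make_cloze(passage, n=5, blank="____"):
    """Replace every n-th word with a blank; return (text, answers).

    Punctuation attached to a deleted word is kept in the text so that
    the passage still reads naturally.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    out, answers = [], []
    for i, word in enumerate(passage.split()):
        core = word.strip(string.punctuation)
        if (i + 1) % n == 0 and core:
            answers.append(core)
            out.append(word.replace(core, blank, 1))
        else:
            out.append(word)
    return " ".join(out), answers


def score_exact(responses, answers):
    """Exact-word scoring: one point per blank filled with the original word."""
    return sum(
        r.strip().lower() == a.lower()
        for r, a in zip(responses, answers)
    )


# Usage with a hypothetical passage.
text, key = make_cloze(
    "The boy went to the store to buy some fresh bread for dinner.", n=5
)
print(text)
print(key, score_exact(["the", "Fresh"], key))
```

Because the passage is supplied by the test developer, locally written texts can be substituted directly, which is what makes this kind of test comparatively easy to adapt to local culture.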

Locally normed CLOZE tests could be fairly quickly
developed and have the advantage of being easy to make
culturally relevant. They have the disadvantage, however,
of being rather specifically related to reading achievement.


Summary

This paper has briefly reviewed the problems of
Title I evaluation in America's Pacific possessions.
It has examined the systems approach to evaluation and the
weaknesses of the models proposed to evaluate Title I
programs, and has suggested the adaption of a "local
norms model" for program evaluation. Several approaches
to instrumentation were also discussed as a basis for
implementing the local norms model. It is suggested
that specific decisions about the best instrumentation
for the Trust Territory can only be made in consul-
tation with local educators and must be based on local
needs and priorities.



REFERENCE NOTES



Anderson. USOE's Title I evaluation technical
assistance efforts: [illegible] the evaluation
system. Paper presented at the Annual Meeting of
the American Educational Research Association, New
York, April 1977.

Baldauf, R.B., Jr. and Reupena, [initial illegible]. A summary of
Stanford achievement test results for May 1971 and
May 1972 in Samoan elementary schools. Pago Pago:
Department of Education, 1972 (Pacific Collection,
Sinclair Library, U. of Hawaii).

Bessey, B.L., Rosen, L.D., Chiang, A. & Tallmadge, G.K.
Further documentation of state ESEA Title I reporting
models and their technical assistance requirements.
Mountain View, CA: RMC Research Corporation, 1976.

Catling, D. and Gobert, D. English Structures Test.
Pago Pago: Department of Education, 1973.

Chin-Chance, S., Norton, R., Rechebei, E., and Bail, F.T.
Final Report on Article II (Testing, Measurement, and
Evaluation) of the University of Hawaii-American
Samoa Contract. Honolulu: Office of Foreign Contracts,
University of Hawaii, 1975.

Doherty, W. Restandardization Study. Calif.: Systems
Development Corporation, 1977.

Gamel, N., Tallmadge, G.K., Wood, C.T. & Binkley, J.L.
State ESEA Title I reports: Review and analysis of
past reports, and development of a model reporting
system and format. Mountain View, CA: RMC Research
Corporation, October, 1975.

Gephart, W.J. Evaluation reconsidered: Do we need a
synthesis? Definitely! Paper presented at the
Annual Meeting of the American Educational Research
Association, New York, April 1977.

Hisama, K.K., Lewis, E.L. and Woehlke, P. A new direction
in measuring proficiency in English as a second
language. A paper presented at the Annual Meeting
of the American Educational Research Association,
New York, 1977.

House, E.R. Assumptions underlying evaluation models.
Paper presented at the Annual Meeting of the American
Educational Research Association, New York, April 1977.

Klingbergs, I. and Dorn. Marianas Test of English
Achievement. Saipan: Department of Education, 1977.



r 



Reasarch 4ssocUtfon, Y^^X^f 

'^'^t^Stig;'^;^ ?f ■Ss^g^^HMei.lg^A Title I 

pnn u ~ '^■tm -aeuorx by stem , iuountain ¥i(»"" — 7 ; 

iie search Corporation, fgye, -^^^ -ie>., Ca: 



REFERENCES

TstT; IS Dane. u. ox wueensJ.an(i i>re^s, 

Baldauf, R.B.-.'^Jr. cb ])imn-Rankin 'p a 

on a Plan fn r cu-riculnr^fo * * / Progress r^p^^-f. 

074 200). i^^^^^'^eno 01 education, 'I yV :> • ( iJKIC 



ort ; 



liiHIC ED 

''ing[i.hi;^^f'e SofIoL'nj-5"-°' "^^^^an lest, of 

-i^ducationfll 



v.-fr,v, 1. — — ^ -^-i- "jxiuxenc y as a eenpr«ai 




N.L. Gage ^df)r°§S?d°Lorof Reaearch^^'Sfa.. 
Chicago: Ra^ Ivr Tllj, " °" ^^""'"''B 

uuea Dy^i;oiietos Michigan Book Store ).*^ • 






Gates, A.I. and MacGinitie, W.H. Gates-MacGinitie Reading
Tests. New York: Teachers College Press, 1972.

Horst, D.P., Tallmadge, G.K. & Wood, C.T. Measuring
achievement gains in educational projects. Los Altos,
CA: RMC Research Corporation, October 1974 (UR-243).
Also A practical guide to measuring project impact on
student achievement. Washington, D.C.: U.S.
Government Printing Office, 1975. (017-080-01460-2).

McLaughlin, M.W. Evaluation and Reform. Cambridge, Mass.:
Ballinger Pub. Co., 1975.

McLeod, J. GAP Reading Comprehension Test (Forms B, R and
Manual). Melbourne: Heinemann, 1967.

McLeod, J. and Anderson, J. GAPADOL Reading Comprehension
Test (Forms G, Y and Manual). South Yarra, Vic.:
Heinemann, 1972.

Madden, R., Gardner, E.P., Rudman, H.G. and Kelly, T.L.
Stanford Achievement Test. New York: Harcourt, Brace,
Jovanovich, 1964.

Marco, G.L., Murphy, R.T., & Quirk, T.J. A classification
of methods of using student data to assess school
effectiveness. Journal of Educational Measurement,
1976, 13, 243-252.

Myers, M. Uncle Sam's a reading puppeteer: Guess who
the puppet is? Learning, 1975, Nov., 20-23, 26.

Rivlin, A.M. Systematic thinking for social action.
Washington, D.C.: Brookings Institution, 1971.

Science Research Associates. SRA Achievement Series
(Multilevel Edition). Chicago: Science Research
Associates, 1973.

Worthen, B.R. & Sanders, J.R. Educational evaluation:
Theory and practice. Worthington, Ohio: Charles A.
Jones, 1975.



Richard B. Baldauf, Jr. is Lecturer in Education at 
James Cook University of North Queensland. He has a 
PhD in Educational Psychology from the University of 
Hawaii, and has taught English-as-a-second language
for three years in Sabah, Malaysia. He was Supervisor
of Educational Testing for three years in American
Samoa, and has been involved in Title I and Title III
project evaluations in American Samoa and the Mariana 
Islands. 



